A PARALLEL COUNTER AND A LOGIC CIRCUIT FOR 
PERFORMING MULTIPLICATION 



5 Field of the Invention 

The present invention generally relates to digital electronic devices and more 
particularly to a digital electronic device performing binary logic. In one aspect the 
present invention relates to a parallel counter and in another aspect the present invention 
relates to a logic circuit which implements the multiplication of binary numbers. 

10 

Background of the Invention 

It is instrumental for many applications to have a block that adds n inputs of the same 
binary weight together. An output of this block is a binary representation of the number 
of high inputs. Such blocks, called parallel counters (L. Dadda, Some Schemes for 

1 5 Parallel Multipliers, Alta Freq 34: 349-356 (1 965); E. E. Swartzlander Jr., Parallel 
Counters, IEEE Trans. Comput. C-22: 1021-1024 (1973)) (the content of which is 
hereby incorporated by reference), are used in circuits performing binary multiplication. 
There are other applications of a parallel counter, for instance, majority- voting decoders 
or RSA encoders and decoders. It is important to have an implementation of a parallel 

20 counter that achieves a maximal speed. It is known to use parallel counters in 
multiplication (L. Dadda, On Parallel Digital Multipliers, Alta Freq 45: 574-580 
(1976)) (the content of which is hereby incorporated by reference). 



A full adder is a special parallel counter with a three-bit input and a two-bit output. A 
25 current implementation of higher parallel counters i.e. with a bigger number of inputs is 
based on using full adders (C. C. Foster and F. D. Stockton, Counting Responders in an 
Associative Memory, IEEE Trans. Comput. C-20: 1580-1583 (1971)) (the content of 
which is hereby incorporated by reference). In general, the least significant bit of an 
output is the fastest bit to produce in such implementation while other bits are usually 
30 slower. 

The following notation is used for logical operations: 

® - Exclusive OR; 
1 



v-OR; 
A- AND; 
^-NOT. 

An efficient prior art design (Foster and Stockton) of a parallel counter uses full adders. 
5 A full adder, denoted FA, is a three-bit input parallel counter shown in figure 1 . It has 
three inputs Xi, X2, X3, and two outputs S and C. Logical expressions for outputs are 

s = Xiex2ex3, 

C = (XiaX2)v(XiaX3)v(X2aX3). 
A half adder, denoted HA, is a two bit input parallel counter shown in figure 1 . It has 
10 two inputs Xi, X2 and two outputs S and C. Logical expressions for outputs are 

s^x^ex^, 

C = X1AX2. 

A prior art implementation of a seven-bit input parallel counter illustrated in figure 2. 

15 

A paper by Irving T. To and Tien Chi Chen entitled "Multiple Addition by Residue 
Threshold Functions and Their Representation by Array Logic" (IEEE Trans. Comput. 
C-22:762-767 (1973)) (the content of which is hereby incorporated by reference) 
discloses a method of adding together a collection of numbers using exact symmetric 
20 functions to implement residue threshold functions. This arrangement provides some 
improvement in speed over conventional full adders but requires a large increase in area 
due to the need to compute exactly. 

Multiplication is a fondamental operation. Given two n-digit binary numbers 
25 A,.i2°-'+A,.22"-'+. . .+Ai2+Ao and B,_j2"-^+B,_22"-'+. . .+Bi2+Bo, 

their product 

P2,.,2^°-^+P2„.22^-^+...+P,2+Po 
may have up to 2n digits. Logical circuits generating all P, as outputs generally follow 
the scheme in figure 14. Wallace has invented the first fast architecture for a multiplier, 
30 now called the Wallace-tree multiplier (Wallace, C. S., A Suggestion for a Fast 

Multiplier, IEEE Trans. Electron. Comput. EC-13: 14-17 (1964)) (the content of which 
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is hereby incorporated by reference). Dadda has investigated bit behaviour in a 
multiplier (L. Dadda, Some Schemes for Parallel Multipliers, Alta Freq 34: 349-356 
(1 965)) (the content of which is hereby incorporated by reference). He has constructed a 
variety of multipliers and most parallel multipliers follow Dadda 's or Wallace's scheme. 

5 

Dadda's multiplier uses the scheme in on figure 22. If inputs have 8 bits then 64 parallel 
AND gates generate an array shown in figure 23. The AND gate sign a is omitted for 
clarity so that A; aBj becomes A^Bj, The rest of figure 23 illustrates array reduction that 
involves full adders (FA) and half adders (HA). Bits from the same column are added by 
10 half adders or full adders. Some groups of bits fed into a full adder are in rectangles. 
Some groups of bits fed into a half adder are in ovals. The result of array reduction is 
just two binary numbers to be added at the last step. One adds these two numbers by one 
of the fast addition schemes, for instance, conditional adder or carry-look-ahead adder. 



15 Summary of the Invention 

In accordance with a first aspect, the present invention provides a parallel counter based 
on algebraic properties of elementary symmetric functions. Each of the plurality of 
binary output bits is generated as an elementary symmetric function of a plurality of 
binary input bits. 

20 

The elementary symmetric functions comprise logically AND combining sets of one or 
more binary inputs and logically OR or exclusive OR logic combining the logically 
combined sets of binary inputs to generate a binary output. The OR and the exclusive 
OR symmetric functions are elementary symmetric functions and the generated output 

25 binary bit depends only on the number of high inputs among the input binary bits. For 
the OR symmetric function, if the number of high inputs is m, the output is high if and 
only if m >k, where k is the size of the sets of binary inputs. Similarly, the generated 
output binary bit using the exclusive OR symmetric function is high if and only if m >k 
and the number of subsets of inputs of the set of high inputs is an odd number. In one 

30 embodiment the size of the sets can be selected. The i*^ output bit can be generated 

using the symmetric function using exclusive OR logic by selecting the set sizes to be of 



3 



size 1\ where i is an integer from 1 to N, N is the number of binary outputs, and i 
represents the significance of each binary output. 

In one embodiment the sets of binary inputs used in the elementary symmetric fimctions 
5 are each unique and they cover all possible combinations of binary inputs. 

In one embodiment of the present invention, the logic circuit is divided into a plurality 
of logic units. Each logic unit is arranged to generate logic unit binary outputs as a 
symmetric function of the binary inputs to the logic unit. The binary inputs are divided 
10 into inputs into a plurality of the logic units, and the binary outputs are generated using 
binary outputs of a plurality of the logic units. 

This embodiment reduces the amount of fan-out in the circuit and increases the amount 
of logic sharing. It thus makes parallel counters for a large binary number more 
15 practicable. 

In one embodiment of the present invention, the logic circuit is divided into a plurality 
of logic units arranged hierarchically. Each logic unit is arranged to generate logic unit 
binary outputs as an elementary symmetric fixnction of the binary inputs to the logic 
20 unit. Logic units at the or each lower level of the hierarchy are included in the logic of 
logic units at the or each higher level in the hierarchy and have more inputs. 

In a specific embodiment of the present invention, the logic and inputs of the parallel 
counter are divided in accordance with a binary tree. The logic circuit is divided into a 
25 plurality of logic units. Each logic unit is arranged to generate logic unit binary outputs 
as an elementary symmetric function of the binary inputs to the logic unit. The binary 
inputs are divided into inputs into the plurality of logic units, and the binary outputs of 
the plurahty of outputs are generated using binary outputs of a plurality of the logic 
units. 



30 



In a preferred embodiment, each of the logic units is arranged to receive T of the binary 
inputs, where n is an integer indicating the level of the logic units in the binary tree, the 

4 
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logic circuit has m logic units at each level, where m is a rounded up integer determined 
from (the number of binary inputs)/ T, logic units having a higher level in the binary 
tree comprise logic of logic units at lower levels in the binary tree, and each logic unit is 
arranged to generate logic unit binary outputs as an elementary symmetric function of 
the binary inputs to the logic unit. 

In one embodiment, each logic unit at the first level is arranged to generate logic unit 
binary outputs as a smallest elementary symmetric function of the binary inputs to said 
logic circuit. 

In one embodiment, each logic unit at the first level is arranged to generate logic unit 
binary outputs as an elementary symmetric function of the binary inputs to the logic 
circuit using OR logic for combining the binary inputs. 



15 In one embodiment, each logic unit at the first level is arranged to logically AND each 
of the binary inputs to the logic unit and to logically OR each of the binary inputs to the 
logic unit to generate the logic unit binary outputs. 

In one embodiment, each logic unit at the first level is arranged to generate logic unit 
20 binary outputs as an elementary symmetric function of the binary inputs to the logic 
circuit using exclusive OR logic for combining the binary inputs. 

In one embodiment, each logic unit at the first level is arranged to logically AND each 
of the binary inputs to the logic unit and to logically exclusively OR each of the binary 
25 inputs to the logic unit to generate the logic unit binary outputs. 

In one embodiment, elementary logic units are provided as the logic units at the first 
level for performing elementary symmetric functions, outputs from each of two primary 
elementary logic units receiving four logically adjacent binary inputs from said plurality 
30 of inputs are input to two secondary elementary logic units, an output from each of the 
secondary elementary logic units is input to a tertiary elementary logic unit, and the 
primary, secondary and tertiary elementary logic units form a secondary logic unit at a 

5 



second level of the binary tree having a binary output comprising a binary output ftom 
each of the secondary elementary logic units and two binary outputs from the tertiary 
elementary logic unit. 

5 In one embodiment, tertiary logic units at a third level of the binary tree each comprise 
two secondary logic units receiving eight logically adjacent binary inputs from the 
plurality of inputs, four elementary logic units receiving as inputs the outputs of the two 
secondary logic units, and fiirther logic for generating binary outputs as an elementary 
symmetric fixnction of the binary inputs to the tertiary logic unit using the binary outputs 
10 of the four elementary logic units. 

In one embodiment, quaternary logic units at a fourth level of the binary tree each 
comprise two tertiary logic units receiving sixteen logically adjacent binary inputs from 
the plurality of inputs, four elementary logic units receiving as inputs the outputs of the 
15 two tertiary logic units, and fiirther logic for generating binary outputs as an elementary 
symmetric fimction of the binary inputs to the quaternary logic unit using the binary 
outputs of the four elementary logic units. 

In one embodiment, elementary logic units are provided as the logic units at the first 
20 level for performing the smallest elementary symmetric fimctions, and logic units for 
higher levels comprise logic units of lower levels. 

In one embodiment, the logic units for higher levels above the second level comprise 
logic units of an immediately preceding level and elementary logic units. 

25 

In one embodiment, each logic unit at each level is arranged to generate logic unit 
binary outputs as an elementary symmetric function of the binary inputs to the logic 
circuit using OR logic for combining the binary inputs. 

30 In one embodiment, each logic unit at each level is arranged to generate logic unit 
binary outputs as an elementary symmetric fimction of the binary inputs to the logic 
circuit using exclusive OR logic for combining the binary inputs. 

6 



In one embodiment of the present invention, each of the binary outputs can be generated 
using an elementary symmetric function which uses exclusive OR logic. However, 
exclusive OR logic is not as fast as OR logic. 

5 

In accordance with another embodiment of the present invention at least one of the 
binary outputs is generated as an elementary symmetric function of the binary inputs 
using OR logic for combining a variety of sets of one or more binary inputs. The logic 
is arranged to logically AND members of each set of binary inputs and logically OR the 
10 result of the AND operations. 

Thus use of the elementary symmetric function using OR logic is faster and can be used 
for generation of the most significant output bit. In such an embodiment the set size is 
set to be 2^'\ where N is the number of binary outputs and the N^ binary output is the 
15 most significant 

It is also possible to use the elementary symmetric function using OR logic for less 
significant bits on the basis of the output value of a more significant bit. In such a case, 
a plurality of possible binary outputs for a binary output less significant than the N^ are 

20 generated as elementary symmetric functions of the binary inputs using OR logic for 
combining a plurality of sets of one or more binary inputs, where N is the number of 
binary outputs. Selector logic is provided to select one of the possible binary outputs 
based on a more significant binary output value. The size of the sets used in such an 
arrangement for the (N-1)*^ bit is preferably 2^'^ + 2°"^ and 2^"^ respectively and one of 

25 the possible binary outputs is selected based on the N^ binary output value. 

In one embodiment of the present invention the circuit is designed in a modular form. A 
plurality of subcircuit logic modules are designed, each for generating intermediate 
binary outputs as an elementary symmetric function of some of the binary inputs. Logic 
30 is also provided in this embodiment for logically combining the intermediate binary 
outputs to generate binary outputs. 
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In one embodiment of the present invention, the logic units are arranged hierarchically 
and at least one logic unit in at least one level of the hierarchy implements an inverted 
elementary symmetric function. In one arrangement, the logic units at an odd number of 
levels in the hierarchy implement inverted elementary symmetric fimctions, logic units 
5 at an even number of levels in the hierarchy implement symmetric functions, and the 
inputs to the logic units at the first level of the hierarchy are inverted. In another 
arrangement logic units at an even number of levels in the hierarchy implement inverted 
elementary symmetric functions, logic units at an even number of levels in the hierarchy 
implement symmetric functions, and the inputs to the logic units at the first level of the 
10 hierarchy are input to logic units in a first level in the hierarchy uninverted. This 

embodiment of the present invention allows faster inverting logic gates to be used in the 
logic circuit. 

Since OR logic is faster, in a preferred embodiment the subcircuit logic modules 
15 implement the elementary symmetric functions using OR logic. In one embodiment the 
subcircuit modules can be used for generating some binary outputs and one or more 
logic modules can be provided for generating other binary outputs in which each logic 
module generates a binary output as an elementary symmetric function of the binary 
inputs exclusive OR logic for combining a plurality of sets of one or more binary inputs. 

20 

Another aspect of the present invention provides a method of designing a logic circuit 
comprising: providing a library of logic module designs each for performing a small 
elementary symmetric function; designing a logic circuit to perform a large elementary 
symmetric function; identifying small elementary symmetric functions which can 

25 perform said elementary symmetric function; selecting logic modules from said library 
to perform said small elementary symmetric fimctions; identifying a logic circuit in the 
selected logic circuit which performs an elementary symmetric function and which can 
be used to perform another elementary symmetric function; selecting the logic circuit 
corresponding to the identified elementary symmetric function and using the selected 

30 logic circuit with inverters to perform said other elementary symmetric function using 
the relationship between the elementary symmetric functions: 

OR_n_k(Xi...XJ = ^OR_n_(n+l-k)(^Xi...^X,) 
8 



where -i denotes an inversion, n is the number of inputs, and k is the number of sets of 
inputs AND combined together. 

Another aspect of the present invention provides a conditional parallel counter having m 
5 possible high inputs out of n inputs, where m<n, and n and m are integers, the counter 
comprising the parallel counter for counting inputs to generate p outputs for m inputs, 
wherein the number n of inputs to the counter is greater than 2^, where p is an integer. 

Thus these aspects of the present invention provide a fast circuit that can be used in any 
10 architecture using parallel counters. The design is applicable to any type of technology 
from which the logic circuit is built. 

The parallel counter in accordance with this aspect of the present invention is generally 
applicable and can be used in a multiplication circuit that is significantly faster than 
1 5 prior art implementations. 

One aspect of the present invention provides a conditional parallel counter having m 
possible high inputs out of n inputs, where m<n, and n and m are integers. The 
conditional parallel counter comprises the parallel counter as described hereinabove for 
20 counting inputs to generate p outputs for m inputs, wherein the number n of inputs to the 
counter is greater than 2^. The conditional multiplier can be used in a digital filter for 
example. 

In accordance with another aspect of the present invention a technique for multiplying 
25 binary numbers comprises an array generation step in which an array of logical 
combinations between the bits of the two binary numbers is generated which is of 
reduced size compared to the prior art. 

In accordance with this aspect of the present invention, a logic circuit for multiplying 
30 two binary numbers comprises array generation logic for performing a logical binary 
operation between each bit in one binary number and each bit in the other binary 
number to generate an array of logical binary combinations comprising an array of 
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binary values, and for further logically combining logically adjacent values to reduce 
the maximum depth of the array to below N bits, where N is the number of bits of the 
largest of the two binary numbers; array reduction logic for reducing the depth of the 
array to two binary numbers; and addition logic for adding the binary values of the two 
5 binary numbers. 

In one embodiment, when two binary numbers are multiplied together, as is 
conventional, each bit Aj of the first binary number is logically combined with each bit 
Bj- of the second number to generate the array which comprises a sequence of binary 
10 numbers represented by the logical combinations, and Bj. The further logical 
combinations are carried out by logically combining the combinations Ai and B^^^^ 
and Bn_i, A^ and B^.s, and A^ and B^.j, where N is the number of bits in the binary 
numbers. In this way the size of the maximal column of numbers to be added together 
in the array is reduced. 

15 

More specifically the array generation logic is arranged to combine the combinations Ai 
AND B„.2 and Ao AND B^.j using exclusive OR logic to replace these combinations and 
to combine AND B^^.j and A<, AND B^.2 to replace the Ai AND B^.i combination, 

20 In one embodiment of the present invention the array reduction logic can include at least 
one of: at least one fiill adder, at least one half adder, and at least one parallel counter. 
The or each parallel counter can comprise the parallel counter in accordance with the 
first aspects of the present invention. 

25 This aspect of the present invention provides a reduction of the maximal column length 
in the array thereby reducing the number of steps required for array reduction. When 
the first aspect of the present invention is used in conjunction with the second aspect of 
the present invention, an even more efficient multiplication circuit is provided. 

30 One embodiment of the present invention provides a multiply-accumulate logic circuit 
comprising the logic circuit as described hereinabove, wherein said array generation 
logic is arranged to include an accumulation of previous multiplications. 

10 



Another aspect of the present invention provides a logic circuit comprising at least four 
inputs for receiving a binary number as a plurality of binary inputs; at least one output 
for outputting binary code; and logic elements cormected between the plurality of inputs 
5 and the or each binary output and for generating the or each binary output in accordance 
with a threshold function implemented as a binary tree and having a threshold of at least 
2. A threshold function is a function which is high if, and only if, at least a threshold 
number k of the inputs are high, where k>2. 

10 In one embodiment of this aspect of the present invention, the logic elements are 

arranged to generate the or each binary output as an elementary symmetric function of 
the binary inputs i.e. the threshold function is implemented as an elementary symmetric 
function. 

15 Another aspect of the present invention provides a logic circuit comprising at least four 
inputs for receiving a binary number as a plurality of binary inputs; at least one output 
for outputting binary code; and logic elements cormected between the plurality of inputs 
and the plurality of binary outputs arranged to generate the or each of the plurality of 
binary outputs as an elementary symmetric function of the binary inputs. 

20 

A further aspect of the present invention provides a method and system for designing a 
logic circuit comprising a plurality of inputs for receiving a binary number as a plurality 
of binary inputs, at least one output for outputting binary code, and logic elements 
cormected between the plurality of inputs and the or each binary output and arranged to 

25 generate the or each binary output as a threshold function of the binary inputs. The 
method comprises determining logic elements for performing the threshold functions; 
and reducing the logic elements by identifying logic elements performing a logical AND 
of two threshold functions and reducing the identified logic elements to logic elements 
for performing the threshold fiinction having the higher threshold, and identifying logic 

30 elements performing a logical OR of two threshold functions and reducing the identified 
logic elements to logic elements for performing the threshold function having the lower 
threshold. 



This aspect of the present invention can be implemented in software using a computer 
system comprising one or multiple networked computers. The invention thus 
encompasses program code for controlling a computer system. The code can be 
provided to the computer system on any suitable carrier medium such as a storage 
medium e.g. a floppy disk, hard disk, CD ROM, or programmable memory device, or a 
transient medium e.g. an electrical, optical, microwave, acoustic, or RF signal. An 
example of a transient medium is a signal carrying the code over a network such as the 
Internet. 

A further aspect of the present invention provides a method and system for designing a 
logic circuit comprising a plurality of inputs for receiving a binary number as a plurality 
of binary inputs, at least one output for outputting binary code, and logic elements 
connected between the plurality of inputs and the binary outputs and arranged to 
generate each binary output as a symmetric function of the binary inputs. The method 
comprises designing the logic circuit using exclusive OR logic; identifying any logic 
which cannot have inputs that are high at the same time; and replacing the identified 
exclusive OR logic with OR logic. 

In one embodiment of this aspect of the present invention, the logic circuit is designed 
to generate each binary output as an elementary symmetric function of the binary inputs. 

In a specific embodiment of this aspect of the present invention, the logic circuit 
comprises a parallel counter. 

This aspect of the present invention can be implemented in software using a computer 
system comprising one or multiple networked computers. The invention thus 
encompasses program code for controlling a computer system. The code can be 
provided to the computer system on any suitable carrier medium such as a storage 
medium e.g. a floppy disk, hard disk, CD ROM, or programmable memory device, or a 
transient medium e.g. an electrical, optical, microwave, acoustic, or RF signal. An 
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example of a transient medium is a signal carrying the code over a network such as the 
Internet. 

A further aspect of the present invention provides a method and system for designing a 
5 logic circuit comprising providing a library of logic module designs each for performing 
a small symmetric function; designing a logic circuit to perform a large symmetric 
function; identifying small symmetric functions which can perform said symmetric 
function; selecting logic modules from said library to perform said small symmetric 
functions; identifying a logic circuit in the selected logic circuit which performs a 

1 0 symmetric function and which can be used to perform another symmetric fimction; and 
selecting the logic circuit corresponding to the identified symmetric function and using 
the selected logic circuit with inverters to perform said other symmetric function using 
the relationship between the symmetric functions: 

OR__n_k(X,...XJ - ^OR_nJn+l-k)(^Xi...^XJ 

15 where denotes an inversion^ n is the number of inputs, and k is the number of sets of 
inputs AND combined together. 

In one embodiment of this aspect of the present invention, the symmetric functions are 
elementary symmetric fiinctions. 

20 

This aspect of the present invention can be implemented in software using a computer 
system comprising one or multiple networked computers. The invention thus 
encompasses program code for controlling a computer system. The code can be 
provided to the computer system on any suitable carrier medium such as a storage 
25 medium e.g. a floppy disk, hard disk, CD ROM, or programmable memory device, or a 
transient medium e.g. an electrical, optical, microwave, acoustic, or RF signal. An 
example of a transient medium is a signal carrying the code over a network such as the 
Internet. 
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Brief description of the drawings 

Embodiments of the present invention will now be described with reference to the 
accompanying drawings, in which: 

Figure 1 is a schematic diagram of a full adder and a half adder in accordance with the 
5 prior art. 

Figure 2 is a schematic diagram of a parallel counter using full adders in accordance 
with the prior art, 

Figure 3 is a schematic diagram illustrating the logic modules executing the symmetric 

functions for the generation of binary outputs and the multiplexor (selector) used for 
10 selecting outputs. 

Figure 4 is a diagram illustrating the logic for implementing the symmetric function 

0R_3_1 according to one embodiment of the present invention. 

Figure 5 is a diagram illustrating the logic for implementing the symmetric function 

0R_4_1 according to one embodiment of the present invention, 
15 Figure 6 is a diagram illustrating the logic for implementing the symmetric function 

0R_5_1 using 2 3 input OR gates according to one embodiment of the present 

invention, 

Figure 7 is a diagram illustrating the logic for implementing the symmetric function 
EXOR_7_l using two input exclusive OR gates according to one embodiment of the 

20 present invention, 

Figure 8 is a diagram illustrating the logic for implementing the symmetric function 
OR_3_2 according to one embodiment of the present invention, 
Figure 9 is a diagram illustrating the logic for implementing the symmetric fonction 
EXOR_5_3 according to one embodiment of the present invention, 

25 Figure 1 0 is a diagram illustrating a parallel counter using the two types of symmetric 
functions and having seven inputs and three outputs according to one embodiment of the 
present invention, 

Figure 1 1 is a diagram illustrating splitting of the symmetric function OR_7_2 into sub 
modules to allow the reusing of smaller logic blocks according to one embodiment of 
30 the present invention, 

Figure 12 is a diagram of a parallel counter using the EX0R__7_1 symmetric function 
for the generation of the least significant output bit from all of the input bits, and smaller 



modules implementing symmetric functions using OR logic to generate the second and 
third output bits according to one embodiment of the present invention, 
Figure 13 is a another diagram of a parallel counter similar to that of Figure 12 accept 
that the partitioning of the inputs is chosen differently to use different functional sub 
5 modules according to one embodiment of the present invention. 

Figure 14 is a diagram schematically illustrating the binary tree organisation of the logic 
in a parallel counter according to a second aspect of the invention, 
Figure 15 is a diagram illustrating the logic block (Block 1) for implementing the 
elementary symmetric functions OR_2_2 and 0R_2_1 according to one embodiment of 
1 0 the present invention. 

Figure 16 is a diagram illustrating the logic block (Block 2) for implementing the 
secondary symmetric functions OR_4_4, OR_4_3, OR_4_2 and 0R_4_1 according to 
one embodiment of the present invention, 

Figure 17 is a diagram illustrating the logic block (Block 3) for implementing the 
15 tertiary symmetric functions OR_8_8, OR_8_7, OR_8__6, OR__8_5, OR_8_4, OR_8_3, 

OR_8__2 and 0R_8_1 according to one embodiment of the present invention. 

Figure 18 is a diagram illustrating the logic block (Block 4) for implementing the 

symmetric functions OR_15_12, OR_15_8 and OR_15_4 according to one embodiment 

of the present invention, 
20 Figure 19 is a diagram illustrating the logic block (Block 5) for implementing the 

elementary symmetric functions EXOR_4_2 and 0R_4_1 according to one embodiment 

of the present invention, 

Figure 20 is a diagram illustrating the logic block (Block 6) for implementing the 
elementary symmetric functions EXOR_15_2 and 0R_15_1 according to one 

25 embodiment of the present invention, 

Figure 21 is a diagram schematically illustrating a parallel counter using the logic 
blocks of Figures 15 to 20 according to one embodiment of the present invention. 
Figure 22 is a diagram illustrating a hierarchical structure for logic units in accordance 
with an embodiment of the present invention, 

30 Figure 23 is a diagram illustrating another hierarchical structure for logic units in 
accordance with an embodiment of the present invention, 
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Figure 24 is a diagram illustrating a further hierarchical structure for logic units in 
accordance with an embodiment of the present invention, 

Figure 25 is a diagram illustrating the hierarchical organisation of logic units in a tree 
structure to implement the elementary symmetric function OR_8_4 in accordance with 
5 an embodiment of the present invention, 

Figure 26 is a diagram of the logic for a high speed implementation of the first level the 
circuit of figure 25, 

Figure 27 is a diagram of the logic for a high speed implementation of the second level 
the circuit of figure 25, 

10 Figure 28 is a diagram of the logic for a high speed implementation of the third level the 
circuit of figure 25, 

Figure 29 is a diagram of the steps used in the prior art for multiplication, 
Figure 30 is a schematic diagram of the process of Figure 29 in more detail, 
Figure 31 is a diagram illustrating the properties of diagonal regions in the array, 
15 Figure 32 is a diagram illustrating array deformation in accordance with the 

embodiment of the present invention and the subsequent steps of array reduction and 
adding. 

Figure 33 is a diagram of logic used in this embodiment for array generation, and 
Figure 34 is a diagram of a logic circuit for generating an output as a threshold function. 

20 

Detained description of preferred embodiments 

A first aspect of the present invention will now be described. 

The first aspect of the present invention relates to a parallel counter counting the number 
25 of high values in a binary number. The counter has i outputs and n inputs where i is 
determined as being the integer part of log2 n plus 1 

A mathematical basis for the first aspect of the present invention is a theory of 
symmetric functions. We denote by C\the number of distinct k element subsets of a set 
30 of n elements. We consider two functions EXOR_n_k and OR_n_k of n variables X^, 
X2, ... X^ given by 

EXOR_n_k(Xi, X2, . . . X,) = e (X,, A Xi2 A . . . aX^, 
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OR_n_k(X,, X^, ... X,) = V (X,, A A ... aX^ 
where (il , i2, . . . ik) runs over all possible subsets of {Xj, X2, . . . X,,} that contain 
precisely k elements. Blocks that produce such outputs are shown on figure 3. 



5 The functions EXOR_n_k and OR_n_k are elementary symmetric functions. Their 
values depend only on the number of high inputs among X^, X2, X3, . . . X^. More 
precisely, if m is the number of high inputs among Xj, X^, X3, . . . X^ then OR_n_k(Xi, 
X2, . . . XJ is high if and only if m>k. Similarly, EXOR_n_k(Xj, X^, . . . X,) is high if 
and only if m>k and C\ is odd. 

10 

Although EXOR_n_k and OR_n_k look similar, OR_n_k is much faster to produce 
since EXOR-gates are slower than OR-gates. 

In the above representation n is the number of inputs and k is the size of the subset of 
15 inputs selected. Each set of k inputs is a unique set and the subsets comprise all 

possible subsets of the set of inputs. For example, the symmetric function 0R_3_1 has 
three inputs Xi, X2 and X3 and the set size is 1 . Thus the sets comprise X^, X2 and X3. 
Each of these sets is then logically OR combined to generated the binary output. The 
logic for performing this function is illustrated in Figure 4. 

20 

Figure 5 illustrates the logic for performing the symmetric 0R_4_1 . 

When the number of inputs become large, it may not be possible to use simple logic. 

25 Figure 6 illustrates the use of two OR gates for implementing the symmetric function 
0R_5_1. 

Figure 7 similarly illustrates the logic for performing EX0R_7_1. The sets comprise 
the inputs Xj, X2, X3^ X4, X5 ,X^, and X7. These inputs are input into three levels of 
3 0 exclusive OR gates . 
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When k is greater than 1, the inputs in a subset must be logically AND combined. 
Figure 8 illustrates logic for performing the symmetric function OR_3_2. The inputs 
and X2 comprise the first set and are input to a first AND gate. The inputs and X3 
constitute a second set and are input to a second AND gate. The inputs X2 and X3 
5 constitute a third set and are input to a third AND gate. The output of the AND gates 
are input to an OR gate to generate the output function. 

Figure 9 is a diagram illustrating the logic for performing the symmetric function 
EXOR_5_3. To perform this function the subsets of size 3 for the set of five inputs 
10 comprise ten sets and ten AND gates are required. The output of the AND gates are 
input to an exclusive OR gate to generate the function. 

The specific logic to implement the symmetric functions will be technology dependent. 
Thus the logic can be designed in accordance with the technology to be used. 

15 

In accordance with a first embodiment of the present invention the parallel counter of 
each output is generated using a symmetric function using exclusive OR logic. 

Let the parallel counter have n inputs Xi, . . . X^ and t+1 outputs S^, S^.i, ... Sq. Sq is the 
20 least significant bit and St is the most significant bit. For all i from 0 to t, 

Si=EX0R_n_2XXi, X^, ... XJ. 

It can thus be seen that for a seven bit input i.e. n=7, i will have values of 0, 1 and 2. 
Thus to generate the output Sq the fonction will be EX0R_7_1, to generate the output Sj 
25 the function will be EX0R_7_2 and to generate the output S2 the function will be 

EXOR_7_4. Thus for the least significant bit the set size (k) is 1, for the second bit the 
set size is 2 and for the most significant bit the set size is 4. Clearly the logic required 
for the more significant bits becomes more complex and thus slower to implement. 

30 Thus in accordance with a second embodiment of the present invention, the most 
significant output bit is generated using a symmetric function using OR logic. 



18 



This is more practical since OR_n_k functions are faster than EXOR_n_k functions. For 
the most significant output bit 

Sk=0R_n_2XX,,X„...XJ. 
In particular, with a seven-bit input 

S,=OR_7_4(X„ X^, X3, X„ X„ X,, Xj). 

Thus in this second embodiment of the present invention the most significant bit is 
generated using symmetric functions using OR logic whereas the other bits are 
generated using symmetric functions which use exclusive OR logic. 

A third embodiment will now be described in which intermediate bits are generated 
using symmetric functions using OR logic. 

An arbitrary output bit can be expressed using OR_n_k functions if one knows bits that 
are more significant. For instance, the second most significant bit is given by 

St,i - (S, A OR_n_M-^) V ((^St) a 0R_n_2*-'). 
In particular, with a seven-bit input 

Si = (S, A OR_7_6(X,, X3, X3, X4, X5, X,, X,)) V 
((^S,) A OR_7_2(X„ X„ X3, X4, X5, X,, X,)). 
A further reduction is 

St = OR_7_6(X,, X2, X3, X4, X5, X,, X,) V 
((^82) A OR_7_2(Xi, X2, X3, X4, X5, X,, X,)). 
A multiplexer MU, shown in figure 3, implements this logic. It has two inputs X^, Xi, a 
control C, and an output Z determined by the formula 

Z - (C A Xi) V ((~nC) A Xo). 

It is not practical to use either EXOR_n_k functions or OR_n_k functions exclusively. It 
is optimal to use OR_n_k functions for a few most significant bits and EXOR_n_k 
functions for the remaining bits. The fastest, in TSMC.25, parallel counter with 7 inputs 
is shown in figure 10. 
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Future technologies that have fast OR_15_8 blocks would allow building a parallel 
counter with 15 inputs. A formula for the third significant bit using OR_n_m functions 
is thus: 

S,_2 = (S, A St.iA OR„n_2*+2*-'+2*-') v (S^ a (^S^.i) a OR_n_2'+2*-') v 
((^SO A St.i A OR_n__2*"'+2*-') v ((-nSJ a (^S^.i) a 0R_n_2*-'). 

A fourth embodiment of the present invention will now be described which divides the 
logic block implementing the symmetric function into small blocks which can be 
reused. 



An implementation of OR_7_2 is shown in figure 1 1 . The 7 inputs are split into two 
groups: five inputs from to X5 and two remaining inputs Xg and Xj. Then the 
following identity is a basis for the implementation in figure 1 1 . 

OR_7_2(Xi,... ,X7) = OR_5_2(Xi, ,..,X,)v 
1 5 (OR_5_l (Xj, . . . X5) A OR„2_l (X,, X^)) v OR_2_2(X6, X,) 

One can write similar formulas for OR_7_4 and OR_7_6. Indeed, 
OR_7_4(Xi,... , X,) = OR_5_4(Xi, X5) v 
(OR_5_3(Xi, ... X5) A OR_2_l(X6, X^)) v 
(OR_5_2(X,, X5) A OR_2__2(X,, X,)), 
20 OR_7_6(Xi, ,,„X,)^ 

(OR_5__5(Xi, . . . X5) A 0R_2_1(X,, X^)) v 
(OR_5_4(Xj, ... X5) A OR_2_2(X,, X^)). 
Thus, it is advantageous to split variables and reuse smaller OR_n_k functions in a 
parallel counter. For instance, an implementation of a parallel counter based on 
25 partitioning seven inputs into groups of two and five is in figure 12. 

Similarly, one can partition seven inputs into groups of four and three. An 
implementation of the parallel counter based on this partition is in figure 13. One uses 
the following logic formulas in this implementation. 
30 OR_7_2(Xi, . . . , X7) = OR_4_2(Xi, X2, X3, XJ v 

(0R„4__1(X,, X,, X3, X4) A OR_3_l(X5, X,, X,)) v OR„3_2(X3, X,, X,), 

20 



OR_7_4(X,, . . X,) - ORJJiX,, X,, X,, XJ v 
(OR_4_3(Xi, X,, X3, X,) A 0RJ„1(X5, X,, X^)) v 
(OR_4_2(Xi, X,, X3, X,) A OR_3_2(X5, X,, X,)) v 

(0R_4_l(Xi, X„ X3, X4) A OR_3_3(X5, X,, X,)), 
5 OR_7_6(Xi, ...,X7) = 

(OR_4_4(Xi, X,, X3, X,) A OR J_2(X5, X,, X,)) v 

(0R„4 J(Xi, X,, X3, X4) A OR_3_3(X5, X,, X,)). 

One needs a method to choose between the implementations in figures 12 and 13. Here 
10 is a pneumonic rule for making a choice. If one or two inputs arrive essentially later 
then one should use the implementation on figure 12 based on partition 7=5+2. 
Otherwise, the implementation on figure 13 based on partition 7=4+3 is probably 
optimal. 

15 Parallel counters with 6, 5, and 4 inputs can be implemented according to the logic for 
the seven input parallel counter. Reducing the number of inputs decreases the area 
significantly and increases the speed slightly. It is advantageous to implement a six 
input parallel counter using partitions of 6, 3 + 3 or 4 + 2. 

20 A preferred embodiment of the present invention will now be described with reference 
to figures 14 to 21. 

Although it is possible to implement any OR_n_k or EXOR_n_k function in two levels 
of logic, the fan-out of each input is very high and the fan-in of the OR gate is also very 
25 high. It is known that both high fan-out and high fan-in contribute significantly to the 
delay of the circuit. It is often required that more than one OR_n_k or EXOR n k 
fimction be computed from the same inputs. A two level implementation does not allow 
sharing of logic thus resulting in high area. 

30 This embodiment of the present invention uses the binary tree splitting of the inputs and 
the logic to reduce fan-out and enable reuse of logic. Figure 14 illustrates schematically 
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the organisation of the logic. At a first level 8 elementary logic blocks 1 are used each 
having two of the binary inputs and providing 2 outputs. The elementary logic blocks 1 
of the first level perform elementary symmetric functions. These can either be exclusive 
OR symmetric functions or OR symmetric functions. At the second level four secondary 
5 logic blocks 2 each use the logic of two elementary logic blocks 1 and hence have four 
inputs and four outputs. The secondary logic blocks 2 perform larger symmetric 
functions. At the third level two tertiary logic blocks 3 each use the logic of two 
secondary logic blocks 2 and hence have eight inputs and eight outputs. The tertiary 
logic blocks 3 perform larger symmetric functions. At the fourth level the parallel 
1 0 counter 4 uses the logic of two tertiary logic blocks 3 and hence has sixteen inputs and 
sixteen outputs. 

As can be seen in figure 14, the binary tree arrangement of the logic enables the logic 
for performing smaller symmetric functions to be used for the parallel counter. Also the 
15 arrangement provides for significant logic sharing. This significantly reduces fan-out . 

As will be described in more detail, it is also possible to provide further logic sharing by 
using the elementary symmetric function logic for combining outputs of previous logic 
blocks in the binary tree. 

20 

The functions OR_16_8, OR_16_4 and OR_16_12 are constructed fi:om the set of 
inputs Xi, X2...X16. Although, the embodiment is described with OR_n_k fimctions the 
same construction appUes to EXOR_n_k fimctions after replacing every OR gate by an 
EXOR gate. 

25 

The principles behind this embodiment of the invention will now be described. The 
function OR_(r+s)_t can be computed as the OR of the functions ORrk '^OR_s_t-k as 
t runs through 0,l,2...k, 

0RJr+s)_t(Xi...X^3) - vj^o' [OR__r_k(Xi...X,) ^ OR„s_(t-k)(X^i...X^,)]. 

30 

In an embodiment with 16 inputs, at a first level the 16 inputs are divided into 8 subsets 
-{X i,X2}, {X3,X4},...,{Xi5,XiJ, each subset containing two inputs. For each subset a 
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logic block 1 that computes 0R_2_1 and OR_2_2 is constructed. The 8 blocks form the 
first level of the tree. Since each input fans out into an OR gate and an AND gate we see 
that each input has a fan-out of two. Also the first layer is very regular consisting of 8 
identical blocks. The logic block 1 for computing the symmetric functions 0R_2_1 and 
5 OR_2_2 is illustrated in figure 1 5 . 

At a second level, 4 logic blocks 2 are formed by combining outputs fi:om two adjacent 
logic blocks 1 at level one. These 4 blocks comprise the second layer of the tree. Each 
block has as inputs the outputs of two adjacent blocks firom level one. The inputs are 
1 0 combined to form the functions 0R_4_1 , OR_4_2, OR_4_3, OR_4_4. The logic block 2 
for computing these sjTnmetric functions is illustrated in figure 16. The indices 1 and 2 
are used in the equations below to distinguish functions formed on different subsets of 
the set of inputs. The symmetric fimctions can be represented as: 



At a third level, 2 logic blocks 3 are formed by combining outputs from two adjacent 
20 logic blocks 2 at level two. These 2 blocks comprise the third layer of the tree. Each 
block has as inputs the outputs of two adjacent blocks from level two. The inputs are 
combined to form the fimctions 0R_8_1, OR_8_2, OR_8_3, OR_8_4, OR_8_5, 
OR_8_6, OR_8_7 and OR_8_8. The logic block 3 for computing these symmetric 
functions is illustrated in figure 17. The symmetric fimctions can be represented as: 
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0R_4_1 = [0R_2_l]i V [OR_2_l]2, 
OR_4_2 = ([0R_2_1],^ [OR_2_l]2) v ([OR_2_2]i+[OR_2_2]2), 
OR_4_3 = ([0R_2_1],^ [OR_2_2]2) v ([OR_2_2]i^ [OR_2_l]2), 
OR_4_4 = [OR_2_2]i^[OR_2_2]2. 
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0R_8_1 = [0R_4_l]i V [OR_4_l]2, 
OR_8_2 = ([OR_4_l],^[OR_4_l]2) v [OR_4_2], v [OR_4_2]2, 
OR_8_3 = ([OR_4_l],^[OR_4_2]2) v 
([OR_4_2],^[OR_4_l]2) V [OR_4_3], v [OR_4_3]2, 
OR_8_4 = ([OR_4_l]i^[OR_4_3]2) v ([OR_4_2]i^[OR_4_2]2) v 

([OR_4_3],^[OR_4_l]2) V [OR_4_4], v [OR_4_4]2, 
OR_8_5 = ([OR_4_l]i^[OR_4_4]2) v ([OR_4_2]i^[OR_4_3]2) v 
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([OR_4_3]i^[OR_4_2]2) v ([OR_4_4]i^[OR_4_l]2), 

OR_8_6 = ([OR_4_2],^[OR_4_4]2) v 
([OR_4_3]i^[OR_4_3]2) v ([OR_4_4],^[OR_4_2]2), 
OR_8_7 = ([OR_4_3],^[OR_4_4]2) v ([OR_4_4],^[OR_4_3]2), 
5 OR_8_8 = [OR_4_4]i^[OR_4_4]2. 

At the final level, 3 outputs are formed by combining outputs from the two adjacent 
logic blocks 3 at level 3. This logic comprises the third layer of the tree. Outputs of the 
two adjacent blocks from level three are combined to form the fimctions OR_16_8, 
1 0 OR l 6 4, and 0R_1 612. The logic block 4 for computing these symmetric functions 
is illustrated in figure 18. The symmetric functions can be represented as: 

OR_16_4 = ([OR_8_l]i^[OR_8_3]2) v ([OR_8_2]i^[OR_8_2]2) v 
([OR_8_3]i^[OR_8_l]2) V [OR_8_4], v [OR_8_4]2, 
15 OR_16_8 = ([OR_8_l],^[OR_8_7]2) v ([OR_8_2]i^[OR_8_6]2) v 

([OR_8_3]i^[OR_8_5]2) v ([OR_8_4]i^[OR_8_4]2) v ([OR_8_5],^[OR_8_3]2) v 
([OR_8_6],^[OR_8_2]2) v ([OR_8_7],^[OR_8_l]2) v [OR_8_8], v [OR_8_8]2, 

OR_16_12 = ([OR_8_4]i^[OR_8_8]2) v ([OR_8_5]i'^[OR_8_7]2) v 
([OR_8_6],^[OR_8_6]2) v ([OR_8_7],^[OR_8_5]2) v ([OR_8_8],^[OR_8_4]2). 

20 

Whilst it is possible in accordance with the invention to generate all of the outputs of the 
parallel counter using the outputs of the logic blocks 3, it is advantageous to determine 
the two least significant bits separately in parallel. This is illustrated in figures 19 and 
20. Although this increases fan-out slightly, it decreases the depth of the tree thus 
25 increases the speed of the circuit. 

Figure 19 is a diagram of a logic block 5 for determining the symmetric functions 
EXOR_4_2 and EX0R_4_1 . In the determination of EXOR_4_2 the faster OR gate 
replaces am EXOR gate, according to: 
30 EXOR_4_2 = ([OR_2_l],^[OR_2_l]2)e[OR_2_2],®[OR_2_2]2= 

= ([OR_2_l],^[OR_2_l]2) V ([OR_2_2],©[OR_2_2]2). 
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Four of these logic blocks 5 are provided to take the 16 inputs. Thus the logic block 5 
can be considered to be a combined level 1 and 2 implementation. 

In figure 19 the final logic gate is not and EXOR gate but an OR gate. This is because 
5 both of the inputs to the gate cannot be high at the same time i.e. AB=0 and thus the 
relationship A©B= AvB holds. Thus in accordance with one aspect of the present 
invention, faster OR gates can be included in the design of a logic circuit by identifying 
situations where this relationship holds. This process can be performed automatically by 
a computer program during logic circuit design. 

10 

Figure 20 is a diagram of a logic block 6 for determining the symmetric functions 
EXOR_15__2 and EX0R_15_1 which comprise the least two significant bits output from 
the parallel counter of this embodiment. This logic block comprises level 3 in the binary 
tree and it uses four of the logic blocks 5. Thus even in this parallel determination of the 
15 least significant two bits, there is reuse of logic using the binary tree structure. 

Figure 21 is a diagram of the parallel counter of this embodiment of the invention in 
which the logic of block 4 is used to determine the most significant bits and the logic of 
block 6 is used to determine the least significant bits. 

20 

In the logic blocks illustrated in figures 16, 17 and 18, it can be seen that in addition to 
sharing logic for the inputs, the outputs of the elementary logic blocks, the secondary 
logic blocks and the tertiary logic blocks are input into elementary logic blocks thus 
providing further logic sharing. The reason for this is that OR fimctions are not 

25 independent. Assuming that k>s, 

OR_n_k A OR__n_s = OR_n_k, 1 

OR_n_k V OR_n_s = OR_n_s 2 

This shows that there are possible redundant AND and OR logical operations in the 
multiplexing operation for the outputs of logic performing small elementary symmetric 

30 functions to implement large elementary symmetric functions. 
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These formulas result in significant reductions in logic for parallel counter. The first 
instance of such a reduction is the following formula for the second most significant bit 
of a parallel counter, 

S,_i = OR__n_(2^+2*-^) V hOR_n_2^ a 0R_n_2^-^]. 

5 

For example, in the circuit of figure 10, the multiplexor (MU) generates Si using 
OR(7,4), OR(7,6) and OR(7,2). In unreduced for the logic to generate is: 

Si = [OR(7,4) A OR(7,6)] v hOR(7,4) a 0R(7,2)] 
Using the equations 1 and 2 above, this reduces to: 
10 Si = 0R(7,6) V hOR(7,4) a 0R(7,2)] 

It can thus be seen that the function OR(7,4) is redundant in the determination of S^ 

In the circuit of figure 21, the multiplexing in performed by the three logic gates. In 
unreduced form the output S2 would be: 
15 82 = [OR(15,8) A OR(15,12)] v [^OR(15,8) a 0R(15,4)] 

Using the equations 1 and 2 given above, this reduces to: 

S2 = OR(15,12) V hOR(15,8) a 0R(15,4)] 
This is the logic illustrated in figure 21 comprising the three logic gates for combining 
the outputs of block 4 i.e. an inverter, an AND gate and an OR gate. 

20 

Thus this process of reduction can be implemented during the logic circuit design 
process to identify where the relationship given in equations 1 and 2 hold thus enabling 
a reduction in logic to be implemented. 

25 To show the second instance of such a reduction, it is assumed that k>s, 
([OR_n_k]iA[OR_m_s]2) v ([OR_m_s]iA [OR__n_k]2) = 
[OR„m_s],A[OR_m_s]2A([OR_n_k]i v [OR_n_k]2). 
These formulas allow the reduction of fan-out by sharing certain logic. As shown on 
block 2, the functions OR_4_2 and OR_4_3 are implemented by three levels of shared 
30 logic, 

OR_4_l = [OR_2J]iV [OR_2_l]2, 
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OR_4_2 = ([OR_2_1],a[OR_2_1]2) v [OR_2_2]i v [0R_2_2]2, 
OR_4_3=[OR_2_1],a[OR_2_1]2 a ([OR_2_2]i v [0R_2_2],), 
0R_4_4 = [OR_2_2]i ^ [OR_2_2]2. 

5 Block 3 is a circuit implementing logic of level three. The reductions afford the 
following expressions for functions 0R_8_1, OR_8_2, OR_8_3, OR_8_4, OR_8_5, 
OR_8_6, OR_8_7, and OR_8_8, 

0R_8_1 = [0R_4_l]i V [OR_4_l]2, 
OR_8_2 = ([OR_4_l],^[OR_4_l]2) v ([OR_4_2]i v [ORJJ-h), 
10 OR_8_3 = [([0R_4_1],^[0R_4_1L) ^ 

([OR_4_2], V [OR_4_2]2)] v [OR_4_3], v [OR_4_3]2, 
OR_8_4 = [([OR_4_l],^[OR_4_l]2) ^ ([OR_4_3], v [OR_4_3]2)] v 

([OR_4_2],^[OR_4_2]2) V [OR_4_4]i v [OR_4_4]2, 
OR_8_5 = [([OR_4_l]i^[OR_4_l]2) ^ ([OR_4_4]i v [OR_4_4]2)] v 
1 5 [([OR_4_2]i^[OR_4_2]2) ^ ([OR_4_3], v [OR_4_3]2)] , 

OR_8_6 = [([OR_4_2]i^[OR_4_2]2) ^ ([OR_4_4]i v [OR_4_4]2)] v 
([OR_4_3],^[OR_4_3]2), 
OR_8_7 = ([OR_4_3]i^[OR_4_3]2) ^ 
([OR_4_4], V [ORJJh), 
20 OR_8_8 = [OR_4_4]i ^ [OR_4_4]2. 

Block 4 is a circuit implementing logic for level 4. The implementation of functions 
OR_16_8, OR_16_4, and OR_16_12 follows reduced formulas, 

25 OR_16_4 = [([OR_8_l],^[OR_8_l]2) ^ ([OR_8_3]i v [OR_8_3]2)] v 

([OR_8_2],^[OR_8_2]2) v [OR_8_4], v [OR_8_4]2, 
OR_16_8 = [([OR_8_l]i^[OR_8_l]2) ^ ([OR_8_7]i v [OR_8_7]2)] v 
[([OR_8_2]i''[OR_8_2]2) ^ ([OR_8_6]i v [OR_8_6]2)] v 
[([OR_8_3]i^[OR_8_3]2) ^ ([OR_8_5], v [OR_8_5]2)] v 
30 ([OR_8_4],^[OR_8_4]2) v [OR_8_8], v [OR_8_8]2, 
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OR„16_12 - [([OR_8_4],^[OR„8_4]2) ^ ([OR_8_8]i v [OR_8_8L)] v 
([OR_8_6],^[OR_8_6]2) v [([OR_8_5]i^[OR_8_5]2) ^ ([OR_8_7], v [OR_8_7]2)] . 

The binary tree principle of this embodiment of the present invention can be 
5 implemented using either OR or EXOR symmetric functions. When using EXOR 
symmetric functions there is a reduction in logic which applies. Assume that k= Si^s^' 
where S is a set of natural numbers uniquely determined by k as a set of positions of 
ones in the binary representation of k. Then 

EXOR„n_k - ANDies EX0R_n_2\ 
10 Thus, designing a circuit computing EXOR_n_k, one gets away with computing only 
functions EX0R_n_2^ on subsets and thus although EXOR logic is slower, there is less 
fan-out than when using OR logic. 

As can be seen in figure 21, the most efficient circuit can comprise a mixture of OR and 
1 5 EXOR symmetric function logic circuits. 

Further reductions can be appUed to logic for a parallel counter based on OR elementary 
symmetric functions. For instance, the third significant bit admits the expression 
= OR_n_(2*+2*-^+2^-') v [^OR_n_(2^+2^-^) a OR_nJ2^+2^"')] v 
20 [^0R_n_2* A OR_n_(2^-^+2*-')] v [^0R__n_2*-^ a 0R_n_2*-']. 

The reduction can be stated more generally using the expression: 

Sk= {OR„n_2'^A^OR_n__2^^^} v {OR_n_2^^^+ 2^ a ^0R_n_2^^'} 
V {OR_n„2^^'+2^A^OR_n„2^^'+2^^'} 
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V OR_n_2'+ T-^+ 

where is the k* binary output, k=0 to t-1 and t is the number of outputs. 

The generation of an output bit below the most significant bit can be explained as 
30 at least one AND combination of the output of one symmetric function with an 
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inverted output of another symmetric function and OR combining the result of 
the AND combinations. 

Another important appUcation of reductions is logic for a conditional parallel counter. A 
5 conditional parallel counter is a module with n inputs. Let Q be a subset of {0,1.. .n} . 
The subset determines a condition. The module produces the binary representation of 
the number of high inputs if this number of high inputs belongs to Q. If the number of 
high inputs does not belong to Q, the outputs can be any logical function. Such a 
module can replace a parallel counter if the number of high inputs is in Q. 

10 

A useful conditional parallel counter has Q={04 . . .m} for some m<n. Logic for such a 
counter can be obtained from logic for a parallel counter with m inputs by replacing 
every OR_m_k with OR_n_k. For instance, if Q= {0,1, 2,3} then a conditional parallel 
counter has 2 outputs Si, Sq given by 
15 Si = 0R_n_2, So = EXOR_n_l. 

Another instance of a conditional parallel counter has Q= {0,1, 2,3, 4,5}, 

S2 = 0R_n_4, Si = ^0R__n_4 a OR_n_2, So = EXOR_n_l . 
If the number of high inputs for one of these two counters does not belong to Q then the 
output is the binary representation of the greatest element of Q, i.e., 3=1 1 or 5=101. 

20 

Although the previously described embodiment comprises a binary tree hierarchical 
arrangement of logic units, the present invention is appUcable to any hierarchical 
arrangement of logic units. The splitting of the inputs need not be on a binary basis and 
all inputs need not be input to logic units for performing small elementary symmetric 
25 functions. 

For example, figure 22 illustrates a hierarchical arrangement in which only four of the 
inputs are input to elementary logic units. Four of the inputs are only input into a logic 
unit at the third level of the hierarchy. Each logic unit at each level of the hierarchy 
30 comprises the logic of logic units at preceding levels. In this example, the logic units at 
level 2 on the left comprise the logic of two logic units at the first level and the logic 
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unit at level 3 on the left comprises two of the logic units at the second level. The logic 
unit at the third level on the right is not formed of sub units of logic in this embodiment. 
The logic unit at the fourth level comprises all of the logic units and comprises the logic 
circuit. 

5 

Figure 23 is another example illustrating the division of the logic circuit of figure 12 
into two hierarchical levels. In this example, one logic unit at the first level has five 
inputs and the other has two. The logic unit having five inputs comprises the logic for 
performing the 0R_5„1, OR_5__2, 0R_5 J , OR_5_4 and OR_5_5 symmetric functions 
10 as can be seen in figure 12. The logic unit having two inputs comprises the logic for 
performing the EX0R_7_1 symmetric function as can be seen in figure 12. The logic 
unit at the second level comprises the logic units at the first level and comprises the 
complete logic circuit. 

15 Figure 24 is a fiirther example illustrating the division of the logic circuit of figure 13 
into two hierarchical levels. In this example, one logic unit at the first level has four 
inputs and the other has three. The logic unit having four inputs comprises the logic for 
performing the 0R_4_1, OR_4_2, OR_4_3 and OR_4_4 symmetric fimctions as can be 
seen in figure 13. The logic unit having three inputs comprises the logic for performing 

20 the 0R_3_1 , OR_3_2 and OR_3_3 symmetric functions as can be seen in figure 1 3 . The 
logic unit at the second level comprises the logic units at the first level and comprises 
the complete logic circuit. 

During the design of the parallel counter it is possible to save logic by reusing fast logic 
25 units already available. There is a usefiil formula, 

0R_n_k(Xi...50 - -nOR_n_(n+l-k)(^Xi...^X,). 
Thus if a library contains a fast module generating OR__4_3 then this module can be 
used with inverters to generate OR_4_2. The opposite observation holds as well: an 
OR_4_2 module enables the generation of OR_4_3. 
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An embodiment that implements a transistor economical and high-speed realisation of 
threshold functions will now be described with reference to figures 25 to 28. In this 
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embodiment the threshold functions are implemented as elementary symmetric 
functions. 



It is generally known in electronics that AND-OR-INVERT gates are both economical 
5 in terms of the number of transistors to realize them and have good delay properties. 
Therefore this embodiment of the present invention utihses this to provide an 
economical and high-speed circuit design. 

As described above, it is known that : 

10 OR„n_k(Xi...XJ - ^OR_nJn+l-k)(^Xi,..^XJ (a) 

This leads to: 

^OR_n_k(Xi...XJ = OR_n_(n+l-k)(-nXi...^XJ (b) 

and 

OR_n„k(-nXi... ^X„) = OR„n_(n+l-k)(Xi...X,) (c) 

15 

To simpHfy notation, equation (a) can be written as: 

[n,k] = [ n.n+1-k r 

Equation (b) can be written as: 

[n,k]' - [ n.n4-l-k 1 

20 Equation (c) can be written as: 

[iLk] = [n,n+l-k]' 

where ' denotes an inversion of the outputs and denotes inverted inputs. 

Using these relationships, the circuit for the elementary symmetric function OR_8_4 
[8,4] can be constructed in a similar manner to the embodiment described hereinabove 

25 with reference to figure 14. Figure 25 illustrates the binary tree implementation of the 
elementary symmetric function OR_8_4 in which there are three levels of logic. The 
circuit receives inverted inputs Xl' . . .X8'. The logic units in the first layer comprise a 
NAND gate and NOR-gate as shown in figure 26. If the inputs to this logic unit are A' 
and B', where ' denotes the inverse, the outputs of this block are [2,1] = (A'aB')' and 

30 [2,2] = (AVE')' respectively. This logic circuit receives inverted inputs and implements 
an elementary symmetric function. 
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At the second level the outputs [2,l]a, [2,2]a, [2,l]b, [2,2]b from two consecutive first 

layer logic units are combined to derive the outputs [4.11 , [ 4,2] , [4,31 and [4,4]. To do 

this the following relationships are used: 
5 a4]-[44]' = ([2,l]a + [2,l]b)' 

[431 = [4,2]' = ([2,11a [2,l]b + [2,2]a + [2,2]b)' 

[4.21 = [4,31' = (Paia [2,21b + [2,21a [2,11b)' 
[4,4]' = ([2,21a [2,21b)' 

The logic circuit realizing these logic equations is shown in figure 27. This logic circuit 
1 0 receives non inverted inputs and implements the inverted elementary symmetric 

function. 

At the third level the outputs [4a]a . [4.21a. [4.3]a . [4.41a . [ 4,1 l b, [4,2]b, [4,31b and 
[4^4]b from two consecutive second layer logic units are combined to derive the output 
15 [8,41. 

The following relationship is used: 
[8,4]= IL5]: = aiM4^+I42]a[4^ 

The logic circuit realizing these logic equations is shown in figure 28. This logic circuit 
20 receives inverted inputs and implements an elementary symmetric function. 

It can thus be seen from figures 25 to 29 that symmetric ftmctions and inverted 
symmetric ftinctions are implemented at alternate levels of the hierarchical arrangement 
of logic units. If there are an odd number of levels, the inputs to the circuit must be 
25 inverted, as is the case for the illustrated example [8,4]. This use of inverted symmetric 
functions enables faster NAND gates to be used instead of AND gates (it should be 
noted that AND gates are implemented as a NAND gate with an inverter and thus the 
use of a NAND gate instead of an AND gate reduces the logic required to implement the 
function). The reduction of logic required also reduces the area of the logic circuit. 

30 

In this embodiment of the present invention, each level of the hierarchical logic 
structure includes inversion logic. 
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Although this technique has been illustrated with respect to the function [8,4], it will be 
apparent to a skilled person that the technique can be appHed to any size function. 
Further, although this embodiment of the present invention has been implemented as a 
5 binary tree structure, the technique can be used for any hierarchical structure of logic 
units. Also, although in this embodiment of the present invention inverted inputs are 
used, if the number of levels of logic units in the hierarchy is even, the inputs need not 
be inverted. Instead inverted symmetric fimctions at an even number of levels can be 
used. Inverted inputs are required for circuits having an odd number of levels 
10 performing inverted symmetric functions. 

An important application of conditional parallel counters is constant multipliers. A 
constant multiplier is a module whose inputs form binary representations of two 
numbers A, B, and outputs comprise the binary representation of the product A*B 
1 5 whenever A is a number that belongs to a set of allowed constants. Since constant 

multipliers are smaller and faster then multipliers, it is beneficial to use them whenever 
one can choose one multiplicand from the set of allowed constants. One can do it, for 
instance, designing a digital filter. 

20 Another aspect of the present invention comprises a technique for multiplication and 
this will be described hereinafter. 

Multiplication is a fundamental operation in digital circuits. Given two n-digit binary 
numbers 

25 A,.i2"-^+A,.22^-'+. . .+A,2+Ao and B,.i2"^+B,.22"-'+. . .+Bi2+Bo, 

their product 

has up to 2n digits. Logical circuits generating all as outputs generally follow the 
scheme in figure 14. Wallace has invented the first fast architecture for a multiplier, now 
30 called the Wallace-tree multipUer (Wallace, C. S., A Suggestion for a Fast Multiplier, 
IEEE Trans. Electron. Comput. EC-13: 14-17 (1964))(fhe content of which is hereby 
incorporated by reference). Dadda has investigated bit behaviour in a multiplier (L. 
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Dadda, Some Schemes for Parallel Multipliers, Alta Freq 34: 349-356 (1965)) (the 
content of which is hereby incorporated by reference). He has constructed a variety of 
multipliers and most multipliers follow Dadda' s scheme. 

5 Dadda' s multiplier uses the scheme in on figure 29. If inputs have 8 bits then 64 parallel 
AND gates generate an array shown in figure 30. The AND gate sign a is omitted for 
clarity so that A^ aBj becomes AiBj. The rest of figure 30 illustrates array reduction that 
involves full adders (FA) and half adders (HA), Bits from the same column are added by 
half adders or full adders. Some groups of bits fed into a full adder are in rectangles. 

10 Some groups of bits fed into a half adder are in ovals. The result of array reduction is 
just two binary numbers to be added at the last step. One adds these two numbers by one 
of fast addition schemes, for instance, conditional adder or carry-look-ahead adder. 

This aspect of the present invention comprises two preferred steps: array deformation 
1 5 and array reduction using the parallel counter in accordance with the first aspect of the 
present invention. 

The process of array deformation will now be described. 

20 Some parts of the multipHcation array, formed by A^j such as on figure 30, have 
interesting properties. One can write simple formulas for the sum of the bits in these 
parts. Examples of such special parts are on figure 3 1 . In general, chose an integer k, 
and those AjSj in the array such that the absolute value of i-j-k is less or equal to 1 
comprise a special part. 

25 

Let Si be the bits of the sum of all the bits of the form AiBj shown on figure 1 . Then 

So ~ ^ ^ -^o 
S, = (A,aBo)©(AoaB,), 
Sj = (A, A Bi) © (A, A Bi A Ao A Bo), 
30 S^k., = (Ak.i A Bk) © (Ak A B^,) © (A, a B, a A^., a B^.,) 

forallk>0. 
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S2k-(A,AB0©(A,.iAB,.iA 

((Ak+i A Bk^i) V (Ai,i A Bk.i A (Ak^i V Bt+0)) 
forallk>l. 

These formulas show that the logic for summing the chosen entries in the array does not 
5 get large. Whereas if random numbers were summed the logic for the (n + 1)"^ bit is 
larger than the logic for the n*^ bit. 

Using these formulas, one can generate a different array. The shape of array changes. 
This is why it is called array deformation. These formulas are important because one can 
10 speed up a multiplication circuit by generating an array of a particular shape. 

The array in figure 32 is for an 8-bit multiplication. The AND gate sign a is omitted for 
clarity so that A^aBj becomes A^Bj. Array deformation logic generates X, Y, and Z: 

X = (AiaB6)©(AoaB,X 
15 Y = Ai aB^a^CAqaB^), 

Z = Ai A B7 A Ao A Bg, 

The advantage of this array over one in figure 30 is that the maximal number of bits in a 
column is smaller. The array in figure 30 has a column with 8 bits. The array on figure 
32 has 4 columns with 7 bits but none with 8 or more bits. The logic for the generation 
20 of X Y and Z is illustrated in figure 33 . This logic can be used in parallel with the first 
two fiiU adders (illustrated in Figure 2) in the array reduction step thus avoiding delays 
caused by additional logic. 

Array reduction is illustrated in figure 32. The first step utihzes 1 half adder, 3 full 
25 adders, 1 parallel counter with 4 inputs, 2 parallel counters with 5 inputs, 1 parallel 

counter with 6 inputs, and 4 parallel counters with 7 inputs. The three parallel counters 
(in columns 7, 8, and 9) have an implementation based on 7=5+2 partition. The bits X, 
Y, and Z join the group of two in the partition. The counter in column 6 is implemented 
on 7=4+3 partition. The counter in column 5 is based on 6=3+3 partition. The remaining 
30 counters should not be partitioned. The locations of full adders are indicated by ovals. 
The half adder is shown by a rectangle. 
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An adder for adding the final two binary numbers is designed based on arrival time of 
bits in two numbers. This gives a slight advantage but it is based on common 
knowledge, that is conditional adder and ripple-carry adder. 

5 

Although in this embodiment the addition of two 8 bit numbers has been illustrated, the 
invention is applicable to any N bit binary number addition. For example for 16 bit 
addition, the array reduction will reduce the middle column height from 16 to 15 thus 
allowing two seven bit full adders to be used for the first layer to generate two 3 bit 
10 outputs and the left over input can be used with the other two 3 outputs as an input to a 
further seven input full adder thus allowing the addition of the 16 bits in only two 
layers. 

Although this embodiment of the present invention has been described with reference to 
15 the formation of the array by logical AND binary combination, this aspect of the present 
invention encompasses any method of forming the array including any method of 
logically combining bits of two binary numbers e.g. OR combining, EXOR combining 
and NAND combining and forming the array using Booth encoding. Further, the length 
of the two binary numbers need not be the same. 

20 

Although this aspect of present invention has been described with reference to a specific 
multiplication logic circuit, the present invention also applies to any logic circuit that 
performs multiplication including a multiply-accumulate logic circuit (which can be 
viewed as a special case of a multiplication logic circuit). In a multiply-accumulate 

25 logic circuit the operation AxB+C is implemented where C is the accumulation of 

previous multiplications. The multiply-accumulate logic circuit operates by generating 
the array of AxB as described hereinabove for the multiplication logic circuit. An 
additional row is added in the array for the bits of C. C can have many more bits than A 
or B due to previous accumulations. This enlarged array then undergoes array reduction 

30 as described hereinabove. 
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This aspect of the present invention can be used with the parallel counter of the first 
aspects of the present invention to provide a fast circuit. 

The parallel counter of the first aspects of the present invention has other applications, 
5 other than used in the multiplier of one aspect of the present invention. It can be used in 
RS A and reduced area multipliers. Sometimes, it is practical to build just a fragment of 
the multiplier. This can happen when the array is too large, for instance in RSA 
algorithms where multiplicands may have more than more than 1000 bits. This 
fragment of a multipHer is then used repeatedly to reduce the array. In current 
10 implementations, it consists of a collection of full adders. One can use 7 input parallel 
counters followed by full adders instead. 

A parallel counter can also be used in circuits for error correction codes. One can use a 
parallel counter to produce Hamming distance. This distance is useful in digital 
1 5 communication. In particular the Hamming distance has to be computed in certain types 
of decoders, for instance, the Viterbi decoder or majority-logic decoder. 

Given two binary messages (Ai, A2, ... AJ and (Bi, B2, ... BJ, the Hamming distance 
between them is the number of indices i between 1 and n such that A^ and B^ are 
20 different. This distance can be computed by a parallel counter whose n inputs are 

The multiply-and-add operation is fundamental in digital electronics because it includes 
filtering. Given 2n binary numbers X^, X^, ... X^, Yj, Y2, ... Y^, the result of this 
25 operation is 

X^Yi + X^Y^ + .-. + X^Y,. 
One can use the multiplier described to implement multiply-and-add in hardware. 
Another strategy can be to use the scheme in figure 29. All partial products in products 
XjYi generate an array. Then one uses the parallel counter X to reduce the array. 

30 

In the present invention, one can use the parallel counter whenever there is a need to add 
an array of numbers. For instance, multiplying negative number in two-complement 
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form, one generates a different array by either Booth recording (A. D. Booth, A Signed 
Binary Multiplication Technique, Q. J. Mech. Appl, Math. 4: 236-240 (1951)) (the 
content of which is hereby incorporated by reference) or another method. To obtain a 
product one adds this array of numbers. 

5 

Figure 34 illustrates an embodiment of another aspect of the present invention. This 
embodiment generates an output in accordance with a threshold function having a 
threshold of 2 and implemented as a binary tree. The circuit thus implements an 
elementary symmetric function for the generation of the output when the number of 
10 input that are high (k) is at least 2. 

The output is thus generated as the OR symmetric function: 

OR„4__2 = XiaX2 V XjaX3 V X1AX4 V X2AX3 v X^aX^ v X3AX4 

15 The circuit can thus act as a switch to provide an output when a certain number of inputs 
are high. The output can comprise any elementary symmetric function e.g. OR_n_k 
where n is the number of inputs and k is the number of high inputs. 

Although in this embodiment only one output is shown, the principles of this aspect of 
20 the present invention can be used to generate more than one output each being OR_n_k. 
For example, one output could be OR_4_2 and another OR_4_3. Thus the present 
invention encompasses logic circuits that provide outputs using threshold functions. 
This can be used for parallel counter outputs or for other logic circuits. 

25 Although the present invention has been described hereinabove with reference t specific 
embodiments, it will be apparent to a skilled person in the art that modifications lie 
within the spirit and scope of the present invention. 

The logic circuits of the embodiments of the present invention described hereinabove 
30 can be implemented in an integrated circuit, or in any digital electronic device. 
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